WikiWars: A New Corpus for Research on Temporal Expressions
Abstract
The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120,000 tokens and 2,600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our temporal expression tagger, to process this corpus, and point to where further work is required. The corpus is publicly available for research purposes.
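To make the notion of TIMEX2 annotation concrete, here is a minimal Python sketch that pulls temporal expressions out of inline-annotated text. The sample sentence is invented for illustration and is not taken from the corpus; the lowercase `val` attribute name and the helper `extract_timex2` are assumptions, and the regular expression only handles non-nested tags.

```python
import re

# Hypothetical sentence with inline TIMEX2 markup (not from the corpus).
# The attribute name "val" and its casing are assumptions for this sketch.
SAMPLE = (
    'The offensive began on <TIMEX2 val="1944-06-06">6 June 1944</TIMEX2> '
    'and lasted <TIMEX2 val="P3M">three months</TIMEX2>.'
)

# Matches one non-nested TIMEX2 element: attributes in group 1, text in group 2.
TIMEX2_PATTERN = re.compile(r'<TIMEX2\b([^>]*)>(.*?)</TIMEX2>', re.DOTALL)
# Matches name="value" attribute pairs inside the opening tag.
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def extract_timex2(text):
    """Return a list of (surface text, attribute dict) for each TIMEX2 span."""
    results = []
    for match in TIMEX2_PATTERN.finditer(text):
        attrs = dict(ATTR_PATTERN.findall(match.group(1)))
        results.append((match.group(2), attrs))
    return results


if __name__ == "__main__":
    for surface, attrs in extract_timex2(SAMPLE):
        print(surface, attrs)
```

A regular expression keeps the sketch short; annotations that nest one TIMEX2 tag inside another would need a proper XML parser instead.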
Similar resources
WikiWarsDE: A German Corpus of Narratives Annotated with Temporal Expressions
Temporal information plays an important role in many natural language processing and understanding tasks. Therefore, the extraction and normalization of temporal expressions from documents are crucial preprocessing steps in these research areas, and several temporal taggers have been developed in the past. The quality of such temporal taggers is usually evaluated using annotated corpora as gold...
Temporal expression normalisation in natural language texts
Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction. In this report, I describe a novel rule-based architecture, built on top of a preexisting system, which is able to normalise temporal expressions detected in English texts. Gold standard temporally-annotated resources are limited in size and this makes research difficul...
Recognising and Interpreting Named Temporal Expressions
This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typ...
Automatic Extraction of Time Expressions Accross Domains in French Narratives
The prevalence of temporal references across all types of natural language utterances makes temporal analysis a key issue in Natural Language Processing. This work addresses three research questions: (1) is temporal expression recognition specific to a particular domain? (2) if so, can we characterize domain specificity? and (3) how can subdomain specificity be integrated in a single tool for unified ...
Supervised Recognition of
This paper reports research on temporal expressions shaped by a common temporal expression for a period of years modified by an adverb of time. From a Spanish corpus we found that some of those phrases are age-related expressions. To determine automatically the temporal phrases with such meaning we analyzed a bigger sample obtained from the Internet. We analyzed these examples to define the rele...